English Visual Understanding

VL Rethinker 7B 6bit

A multimodal model based on Qwen2.5-VL-7B-Instruct that supports visual question answering. It has been converted to MLX format for efficient inference on Apple silicon.

Transformers English

Brahmai Clip V0.1

A CLIP model that pairs a ViT-L/14 image encoder with a masked self-attention Transformer text encoder, intended for research on zero-shot image classification.

Transformers English

brahmairesearch

UForm-Gen is a small generative vision-language model used mainly for image captioning and visual question answering.

Transformers English

Hashtaggenerater

Flickr30k is an English dataset for image-to-text tasks, commonly used to train and evaluate image captioning models.

Transformers English

© 2025 AIbase